Thomas Lange: Removing tens of thousands of web pages
In January I removed tens of thousands of web pages from
www.debian.org. Did you notice?
In the past
From 1997 onwards, we had web pages for security announcements. We
had to manually prepare a .data and a .wml file, from which a web page
was generated for each security announcement (DSA or DLA). The short
list of the 6 most recent announcements was also created from these
files. Most of the work that went into the Debian web pages was
creating these files.
Our search engine often listed the pages with security announcements instead
of a more relevant web page for a particular topic.
Preparation
At DebConf Kosovo (2022) I started with a proof of concept and wrote a
script that generates this list without using the .data/.wml files in
the Git repository, reading the primary sources of security
information[1] instead. The new list now also includes links to the
security tracker and to the email of the announcement.
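A minimal sketch of how such a list could be generated, in Python.
This is not the script used for the website. The input format is an
assumption (one dated header line per advisory, followed by indented
detail lines), and the function names, advisory IDs and package names
in the sample are hypothetical.

```python
import re
from datetime import datetime

# Assumed header shape: "[03 Jan 2024] DSA-0003-1 examplepkg-c - security update"
HEADER = re.compile(
    r"^\[(?P<date>\d{1,2} \w{3} \d{4})\] "
    r"(?P<id>D[SL]A-\d+(?:-\d+)?) "
    r"(?P<package>\S+)(?: - (?P<summary>.*))?$"
)

# Invented sample data, only to show the assumed layout.
SAMPLE = (
    "[03 Jan 2024] DSA-0003-1 examplepkg-c - security update\n"
    "\t{CVE-0000-0003}\n"
    "\t[bookworm] - examplepkg-c 1.0-1+deb12u1\n"
    "[20 Dec 2023] DSA-0001-1 examplepkg-a - security update\n"
    "\t{CVE-0000-0001}\n"
    "[02 Jan 2024] DSA-0002-1 examplepkg-b - security update\n"
    "\t{CVE-0000-0002}\n"
)


def parse_advisories(text):
    """Return one dict per advisory header line found in text."""
    advisories = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match is None:
            continue  # indented detail lines are skipped in this sketch
        entry = match.groupdict()
        # %b expects English month abbreviations (C locale).
        entry["date"] = datetime.strptime(entry["date"], "%d %b %Y").date()
        advisories.append(entry)
    return advisories


def most_recent(advisories, count=6):
    """Return the newest advisories first, at most count of them."""
    return sorted(advisories, key=lambda a: a["date"], reverse=True)[:count]


for advisory in most_recent(parse_advisories(SAMPLE)):
    print(advisory["date"], advisory["id"], advisory["package"])
```

Sorting by the parsed date rather than relying on the order of the
file keeps the sketch independent of how the source happens to be
ordered.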
The following web pages and scripts also used these .data and .wml
files:
- OVAL files
- RSS feeds for security announcements (and LTS)
- Apache config file for mapping URLs from dsa-NNN to YEAR/dsa-NNN
- A huge list of crossreferences between DSA and CVE numbers
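To illustrate the last two items, here is a rough Python sketch that
derives a CVE to advisory cross-reference and a dsa-NNN to YEAR/dsa-NNN
mapping from a plain advisory list instead of per-page files. The
input format (a dated header line, then indented lines with CVE IDs in
braces), the function names and the sample data are all assumptions
made for illustration. The real scripts and the Apache configuration
are not shown in this post.

```python
import re
from collections import defaultdict

# Assumed header shape: "[03 Jan 2024] DSA-0002-1 examplepkg-b - security update"
ADVISORY = re.compile(r"^\[\d{1,2} \w{3} (?P<year>\d{4})\] (?P<id>D[SL]A-\d+)")
CVE = re.compile(r"CVE-\d{4}-\d+")

# Invented sample data, only to show the assumed layout.
SAMPLE = (
    "[03 Jan 2024] DSA-0002-1 examplepkg-b - security update\n"
    "\t{CVE-0000-0001 CVE-0000-0002}\n"
    "[20 Dec 2023] DSA-0001-1 examplepkg-a - security update\n"
    "\t{CVE-0000-0001}\n"
)


def cross_reference(text):
    """Map each CVE ID to the advisories whose detail lines mention it."""
    cve_to_advisories = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = ADVISORY.match(line)
        if header:
            current = header.group("id")
        elif current and line[:1].isspace():
            for cve in CVE.findall(line):
                if current not in cve_to_advisories[cve]:
                    cve_to_advisories[cve].append(current)
    return dict(cve_to_advisories)


def year_paths(text):
    """Map 'dsa-NNN' to 'YEAR/dsa-NNN', using the earliest year seen."""
    years = {}
    for line in text.splitlines():
        header = ADVISORY.match(line)
        if header:
            name = header.group("id").lower()
            year = int(header.group("year"))
            if name not in years or year < years[name]:
                years[name] = year
    return {name: f"{year}/{name}" for name, year in years.items()}


print(cross_reference(SAMPLE))
print(year_paths(SAMPLE))
```

Taking the earliest year for an advisory is a design choice of this
sketch, so that a later revision of the same advisory does not change
its path.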
German (de) 3501 28.5%
Italian (it) 1005 8.2%
Danish (da) 6336 51.7%
After
German (de) 1486 59.0%
Italian (it) 909 36.1%
Danish (da) 982 39.0%
Cleanup of all the security web pages
Finally, in January, I could remove all web pages of the security
announcements in one git commit[5]. Using several git rm -rf commands,
this commit removed 54335 files, including around 9650 DSA/DLA data
files, 44189 wml files and nearly 500 Makefiles.
Outcome
The security team no longer needs to do any manual work for the web
pages, and we now have direct links from a DSA-NNN/DLA-NNN to the
email in our mailing list archive. This was not possible before.
The search results have become more accurate.
But we still host a lot of other old content on the Debian web pages
which may be removed in the future.
[1] https://www.debian.org/security/#infos
[2] https://www.debian.org/security/oval/
[3] https://salsa.debian.org/security-tracker-team/security-tracker/-/raw/master/data/DSA/list
[4] https://www.debian.org/devel/website/stats
[5] https://salsa.debian.org/webmaster-team/webwml/-/commit/2aa73ff15bfc4eb2afd85c